Data mining of user activity data to identify sequential item acquisition patterns

ABSTRACT

A data mining component collectively analyzes item acquisition histories of users of an electronic catalog of items and identifies pairs of catalog items that tend to be acquired sequentially. The data mining component may also generate data regarding such sequential item acquisition patterns. For example, the data mining component may determine whether user acquisitions of the two items tend to be spaced apart in time by a characterizing time interval, and/or may determine percentages of users who have followed particular sequential acquisition patterns. Information regarding the detected sequential item acquisition patterns may be exposed to users on electronic catalog pages, and/or may be used to select the timing with which particular items are recommended to users.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 10/945,547, filed Sep. 20, 2004 (the disclosure of which is hereby incorporated by reference), which is a continuation-in-part of U.S. application Ser. No. 10/864,288, filed Jun. 9, 2004.

FIELD OF THE INVENTION

The present invention relates to data mining algorithms for analyzing item acquisition histories of users of an electronic catalog of items.

BACKGROUND OF THE INVENTION

A variety of technologies exist for collecting and mining user activity data reflective of the actions and preferences of users of an electronic catalog. For example, it is known in the art to collectively analyze the activity data of a population of users to identify items that tend to be viewed, purchased, or otherwise selected in combination. Different types of item relationships may be detected by applying different similarity algorithms and metrics to the activity data. For instance, a pair of items, A and B, may be identified as likely substitutes on the basis that a relatively large number of the users who view A also view B during the same browsing session. Items C and D, on the other hand, may be identified as complementary because a relatively large number of those who purchase C also purchase D.

The item relationships extracted from the user activity data may be exposed to users of the electronic catalog to assist users in identifying items of interest. For example, in some systems, when a user views a catalog item, the user is informed of other items that are commonly viewed (or purchased) by those who have viewed (or purchased) the item. Although this type of data is helpful, users could benefit from knowing more about the relationships that exist between specific items.

SUMMARY

The present invention comprises data mining methods for analyzing user activity data associated with an electronic catalog of items to generate various types of item relationship data. The item relationship data may be presented in the electronic catalog to assist users in making informed item selection decisions, and/or may be used to recommend specific items to users. The invention may be embodied within any type of electronic catalog system (web site, online services network, multi-site “mall” system, etc.) in which users can select catalog items to purchase, rent, download, or otherwise acquire.

In one embodiment, a data mining component collectively analyzes item acquisition histories of users of an electronic catalog of items to identify pairs of items that tend to be acquired sequentially. The data mining component may also determine, for each such item pair, whether user acquisitions of the two items tend to be spaced apart in time by a characterizing time interval. In addition, the data mining component may calculate one or more conditional probability values reflective of the frequencies with which users who acquire the first item in the pair acquire the second item after waiting for a particular interval of time.

The item relationship data extracted by the data mining component may be used to supplement item detail pages, or other pages of the electronic catalog, with information that assists users in selecting items to acquire. For instance, the detail page for a particular item, item A, may be supplemented with a list of other items that are frequently purchased a particular amount or interval of time, such as three to five months, after acquiring item A. This list may also include associated conditional probability values, which may be expressed as percentages. For instance, the detail page for item A may indicate that 40% of the users who acquired item A acquired item B five or more months later.

The extracted item relationship data may additionally or alternatively be used to select items to recommend to users at specific points in time. For instance, if it is determined that a relatively large percentage of the users who acquire item C acquire item D approximately five months later, item D may be recommended to users who acquired item C five months ago and have not yet acquired item D. The recommendations may be provided by email, customized web pages, and/or other communications methods.

Neither this summary nor the following detailed description purports to define the invention. The invention is defined by the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a web site system according to one embodiment of the invention.

FIG. 2A illustrates an example dataset that reveals the existence of a characterizing time interval for a particular pair of items that tend to be acquired by users in a particular order.

FIG. 2B illustrates an example dataset that does not reveal the existence of a characterizing time interval.

FIG. 3 illustrates one example of an item detail page that may be generated and provided to users to convey additional information about relationships between specific items.

FIG. 4, which consists of FIGS. 4A and 4B, illustrates one example of a data mining method that may be used to generate a sequential-acquisition pattern table of the type shown in FIG. 1, and which may be used to identify item relationship data of the type shown in FIG. 3.

FIG. 5 illustrates one example of a method that may be used to generate item recommendations based on sequential acquisition pattern data mined from user activity data.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

Specific embodiments of the invention will now be described with reference to the drawings. These embodiments are intended to illustrate, and not limit, the present invention. For example, although the specific embodiments described herein involve the generation and display of data regarding item purchase events, the invention is also applicable to other types of item acquisition actions, including rentals, licenses and downloads of items.

FIG. 1 illustrates a web server system 30 according to one embodiment of the invention. The web server system 30 includes a web server 32 that generates and serves pages of a host web site to computing devices 34 of end users. Although depicted as desktop computers, the computing devices 34 may include a variety of other types of devices, such as cellular telephones and Personal Digital Assistants (PDAs). The web server 32 may be implemented as a single physical server or a collection of physical servers. The invention may alternatively be embodied in another type of multi-user, interactive system, such as an interactive television system, an online services network, or a telephone-based system in which users select items to acquire via telephone keypad entries and/or voice.

The web server 32 provides user access to an electronic catalog of items represented within database 36 or a collection of databases. An item acquisition processing component 33 that runs on, or in association with, the web server 32 provides functionality for users to place orders for catalog items they wish to acquire. The items represented in the database 36 may include or consist of items that may be purchased, rented, licensed, or otherwise acquired via the web site (e.g., consumer electronics products; household appliances; book, music and video titles in physical or downloadable form; magazine subscriptions, etc.). In one embodiment, the items consist primarily or exclusively of physical products that may be purchased via the web site. Many hundreds of thousands or millions of different items may be represented in the database 36. As is conventional, the items may be arranged within a hierarchy of browse categories to facilitate navigation of the catalog.

In one embodiment, detailed information about each item may be obtained via the web site by accessing the item's detail page within the electronic catalog (see example item detail page shown in FIG. 3). Each item detail page may be located by, for example, conducting a search for the item via a search engine of the web site, or by selecting the item from a browse tree listing. Each item detail page may provide an option for the user to acquire the item from a retail entity and/or from another user of the system. The web server 32 may generate the item detail pages, and other pages of the web site, dynamically using a repository of web page templates 38.

As illustrated in FIG. 1, the web server system 30 maintains item acquisition histories 40 for each user of the web site. The item acquisition history 40 of each user identifies all of the catalog items purchased or otherwise acquired by that user via the web site, together with the associated dates of acquisition. Depending upon the nature and purpose of the web site, the item acquisition histories may, for example, be item purchase histories, item rental histories, item download histories, or a combination thereof. In some embodiments, the item acquisition histories 40 may include data obtained from external sources, such as the web site systems of business partners. Item acquisition histories 40 of many hundreds of thousands or millions of unique users may be maintained and analyzed by the system 30. Each user account may be treated as a separate user for purposes of maintaining item acquisition histories; thus for example, if members of a household share a single account, they may be treated as a single user.

As further illustrated in FIG. 1, a data mining component 44 periodically and collectively analyzes or “mines” the item acquisition histories of the users to generate a “sequential-acquisition pattern” table 46. Each entry (depicted as a row) of this table 46 identifies a pair of items that, based on a collective, computer-based analysis of the item acquisition histories 40 of users, tend to be acquired sequentially in the order indicated (i.e., first item followed by second item). For example, the first row of this table 46 indicates that a relatively large portion of the users who acquired item A thereafter acquired item D. This type of relationship may exist where, for example, the second item in the pair (i.e., the later-acquired item) is an accessory for, a replacement part for, or a sequel to, the first item in the pair.

In the example shown in FIG. 1, some of the table entries also include data indicating a characterizing time interval between acquisitions of the first and second items of the pair. For example, the first entry in the table 46 indicates that users who have acquired item D after acquiring item A have typically done so three to four months after acquiring item A. As illustrated by this example, the characterizing time intervals may optionally be in the form of bounded ranges, such as “3 to 4 months.” As depicted by the second entry in the table 46, ranges that are unbounded at the upper end, such as 7+ months (meaning “at least seven months”), may additionally or alternatively be used.

The characterizing time interval for a given pair of items may be determined by, for example, analyzing data sets of the type depicted in FIGS. 2A and 2B, each of which corresponds to a particular, hypothetical pair of items. Each such graph illustrates how long users typically wait before acquiring the second item in the pair once they have acquired the first item. The graph shown in FIG. 2A reveals a characterizing time interval of approximately three to four months because a statistically significant peak exists in the acquisition count (number of users) for this time interval bin. The graph shown in FIG. 2B, on the other hand, reveals a scenario in which a characterizing time interval does not clearly exist. For some item pairs, such as those in which the second item is ordinarily replaced each time it is consumed, multiple equally-spaced peaks may appear in the graph. In such scenarios, the peaks occurring after the initial peak may optionally be ignored (i.e., the characterizing time interval may be based solely on the initial purchase of the second item).

In one embodiment, table entries are created only for those item pairs for which a characterizing time interval is detected. To reduce the effect of product release dates, a given item pair may be excluded from the table 46 if the only characterizing time interval detected is approximately equal to the interval of time between the release of the first item and the release of the second item. The effects of product release dates may also be reduced by excluding from consideration item acquisition events that occurred shortly after (e.g., within one week of) the release of the corresponding item. Examples of algorithms that may be used to detect characterizing time intervals are discussed below.

The invention may, in some embodiments, be practiced without detecting characterizing time intervals. In addition, the characterizing time intervals may be detected and represented using methods other than those depicted in the drawings. For instance, although FIGS. 2A and 2B imply the use of binning of time intervals for purposes of the analysis, the characterizing time intervals may alternatively be detected without the use of binning. In addition, although FIG. 1 illustrates examples in which the characterizing time intervals are stored as ranges (e.g., “3 to 4 months” or “7+ months”), the characterizing time intervals may additionally or alternatively be detected and stored in other forms, such as a single value (e.g., “25 weeks”) representing, e.g., the average, minimum, or maximum amount of time users typically wait before acquiring the second item.

As further illustrated in FIG. 1, the sequential-acquisition pattern table 46 may also store “conditional probability” values for some or all of the pairs of items represented therein. These values generally reflect a frequency with which users who have acquired the first item have thereafter acquired the second item. Any of a variety of different methods may be used to calculate the conditional probability values. For example, where a characterizing time interval has been detected for a given pair of items, a conditional probability value may be calculated that corresponds to this characterizing time interval. Thus, for example, the first entry in the table 46 (FIG. 1) may indicate that 23% of the users who purchased item A purchased item D three to four months later; and the second entry may indicate that 78% of the users who purchased item A purchased item X seven or more months later.

Conditional probability values may additionally or alternatively be calculated without regard to characterizing time intervals. For example, the third table entry in FIG. 1, which does not include a characterizing time interval, may indicate that 33% of the users who purchased item B thereafter purchased item E. Two or more different types of conditional probability values may be calculated, and stored in the table 46, for a given pair of items (e.g., one value which is tied to a characterizing time interval, and one which is not). The invention may alternatively be practiced, in some embodiments, without the calculation or use of conditional probability data.

As illustrated in FIG. 1, the data mining component 44 may be configured in the illustrated embodiment by setting or adjusting a set of data mining parameters 50. One such parameter is a look-back period that specifies the time window of item acquisition events to be considered. For example, if a look-back period of three years is used, the table 46 will be generated by analyzing item acquisition events that occurred over the last three years from the current date. Different look-back periods may be used for different categories of items, and/or to detect different types of relationships. The other data mining parameters depicted in FIG. 1 are discussed below in connection with FIG. 4.

FIG. 3 illustrates some of the different ways the table data for an item can be incorporated into the item's detail page of the electronic catalog to assist users in making informed item selection decisions. In this example, the item featured on the page is the Canon i560 Desktop Photo Printer. In addition to providing functionality for users to select this item for purchase, the page includes the following: (a) a related items section 60 which lists items that are commonly purchased by users 3-5 months after purchasing the featured item, (b) a related items section 62 listing items that are commonly purchased six or more months after purchasing the featured item, and (c) a related items section 64 which lists items that are commonly purchased by users who have already purchased the featured item. Each item in these sections 60-64 is displayed as a hyperlink to the respective item's detail page in the electronic catalog.

Although three different related items sections 60-64 are shown for purposes of illustration, any one or more of the sections may be omitted, and any two or more may be combined. In addition, although the item relationship data is presented on an item detail page in this example, it can be conveyed to users via email messages, other types of catalog pages, or any other method.

In the example shown in FIG. 3, each item in the related items sections 60-64 is displayed together with a corresponding conditional probability value that indicates a measure of the frequency with which users who have purchased the featured item have thereafter purchased the respective related item. For example, the page indicates that 20% of the users who acquired the Canon i560 Desktop Photo Printer acquired a Canon BCI-6Y Yellow Ink Tank three to five months later. This data value (20%) assists viewers of the page in assessing the strength of the time-based relationship between the two items. The conditional probability values may also be helpful for selecting between items that are substitutes. For example, a user may use the data values provided in section 62 of the page to select between two alternative black ink tank products.

Although conditional probability values are illustrated in FIG. 3, they may alternatively be omitted, or may be presented in another form (e.g., graphically using charts, graphs, icons, or color coding). In addition, to further assist users in identifying relationships between specific items, charts of the type shown in FIGS. 2A and 2B may be exposed to users via the catalog.

Related items sections 60-64 of the type shown in FIG. 3 may be generated automatically using data read from the table 46. For example, to generate the related items sections 60-64 shown in FIG. 3, all table entries for which the Canon i560 is listed as the first item may initially be retrieved. These table entries may then be grouped such that those with a common characterizing time interval (or with no characterizing time interval) are grouped together. Finally, within each such group, the items may be ordered for display from highest to lowest conditional probability. The task of generating the related items sections 60-64 may be performed dynamically by the web server 32 in response to page requests from the user devices 34, such that updates to the table 46 are immediately reflected in newly generated web pages. Alternatively, the sections 60-64 may be incorporated into the semi-static content of the item detail pages until new table data becomes available.
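The retrieve, group, and order steps described above might be sketched as follows. This is a minimal illustration only: the row layout, the function name related_item_sections, and every row other than the 20% Yellow Ink Tank entry are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict

# Hypothetical rows of the sequential-acquisition pattern table:
# (first item, second item, characterizing interval or None, conditional probability).
# Only the Yellow Ink Tank figure (20%) appears in the text; the rest are made up.
PATTERN_TABLE = [
    ("Canon i560", "Item P", "3-5 months", 0.19),
    ("Canon i560", "Canon BCI-6Y Yellow Ink Tank", "3-5 months", 0.20),
    ("Canon i560", "Item Q", "6+ months", 0.12),
    ("Canon i560", "Item R", None, 0.30),
    ("Item Z", "Item Q", "6+ months", 0.50),
]

def related_item_sections(table, featured_item):
    """Group the featured item's entries by interval, highest probability first."""
    sections = defaultdict(list)
    for first, second, interval, prob in table:
        if first == featured_item:          # retrieve entries for the featured item
            sections[interval].append((second, prob))
    for items in sections.values():         # order each group for display
        items.sort(key=lambda pair: pair[1], reverse=True)
    return dict(sections)

sections = related_item_sections(PATTERN_TABLE, "Canon i560")
```

Entries with no characterizing time interval are collected under the key None, which would correspond to a section such as section 64.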

FIG. 4, which consists of FIGS. 4A and 4B, illustrates an example process (sequence of steps) that may be performed by the data mining component 44 of FIG. 1 to generate the table 46. This process may be repeated periodically (e.g., once a week) to update or regenerate the table 46 so that the table data reflects the most recent set of user activity data. Typically, the table 46 will be generated based on the purchase actions, or other acquisition actions, of many thousands, hundreds of thousands, or millions of users. For purposes of generating the table 46 and displaying item relationship data, different versions of a given product or work may be treated as the same item. Examples include hardcover and paperback versions of the same book title, video and DVD versions of the same movie title, CD and tape versions of the same music title, and different releases or editions of a particular product.

The process depicted in FIG. 4 makes use of a predefined “former acquisition pool” and a predefined “new acquisition pool.” The former acquisition pool specifies the items that can serve as a “first item” in the table 46 of FIG. 1. The new acquisition pool specifies the items that can serve as a “second item” in the table 46. The pools may be defined so as to limit the types of relationships that are detected. For example, by using a former acquisition pool consisting of non-consumable items and a new acquisition pool consisting of consumable items, item relationships may be detected in which the second item is usually a replacement part for the first item. By allowing all types of items to appear in the former and new acquisition pools, a wider range of item relationships may be detected. The pool definitions and the look-back period may be selected in conjunction such that the pools consist of items that have been available throughout the entire look-back period.

In step 70 of FIG. 4, the item acquisition histories of all users of the system are retrieved, excluding any item acquisition events preceding the look-back period. Any appropriate look-back period may be used, such as six months, one year, three years, or infinity. In step 72, one of the users is selected as the “current user.” In step 74, the first (least recently acquired) item in the retrieved item acquisition history of the current user is selected as the “current item.”

In step 76, if the current item is in the former acquisition pool, its total acquisition count is incremented by one. At the end of the process of FIG. 4, each item in the former acquisition pool has a total acquisition count equal to the number of times that item was acquired during the look-back period. Multiple acquisitions of an item by a single user may optionally be treated as a single acquisition of the item, such that this count value represents the number of unique acquirers of the item. The total acquisition counts may be maintained in a temporary table of the type depicted in Table 1 below.

As depicted by blocks 78 and 80, if the current item is in the new acquisition pool, the process identifies the former-acquisition-pool items, if any, acquired by the current user more than MinT before acquiring the current item. The parameter MinT is a minimum time interval, such as one month or three months, that may optionally be used to exclude from consideration item purchase events that are close in time. For each former-acquisition-pool item identified in block 80, a count value is incremented for the corresponding tuple {prior item, current item, time interval bin}, where “time interval bin” is a range or bin of possible time durations between the two acquisition events. For example, if the current user acquired item 1 (a former-acquisition-pool item), and acquired item 2 (a new-acquisition-pool item) three months and ten days later, the count for the tuple {item 1, item 2, 3-4 months} would be incremented, assuming time interval bins with a width of one month are used. The tuple counts may be maintained in a temporary tuples table (see Table 2 below, which shows tuple entries for a specific pair of items). If a particular tuple does not already exist in the temporary tuples table in step 80, it may be added.

As depicted by blocks 82-88, steps 76-80 are repeated for each additional item in the current user's acquisition history until the entire acquisition history is processed. The acquisition history of each additional user is then analyzed in the same manner until all of the retrieved item acquisition histories have been fully processed.
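The counting pass of steps 72-88 might be sketched as below. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function names, the in-memory layout of the histories, and the use of 30-day months for one-month-wide bins are all hypothetical choices made for the example.

```python
from collections import Counter
from datetime import date, timedelta

def bin_label(days, bin_days=30):
    """Label a gap in days with a one-month-wide bin (30-day months are an assumption)."""
    lower = days // bin_days
    return f"{lower}-{lower + 1} months"

def count_acquisitions(histories, former_pool, new_pool, min_gap=timedelta(0)):
    """Single pass over all user histories, loosely mirroring steps 72-88."""
    total_counts = Counter()   # item -> total acquisition count (Table 1)
    tuple_counts = Counter()   # (prior item, current item, bin) -> count (Table 2)
    for events in histories.values():
        events = sorted(events, key=lambda e: e[1])   # least recently acquired first
        for i, (item, when) in enumerate(events):
            if item in former_pool:                   # step 76
                total_counts[item] += 1
            if item in new_pool:                      # blocks 78 and 80
                for prior, prior_when in events[:i]:
                    gap = when - prior_when
                    if prior in former_pool and gap > min_gap:   # more than MinT
                        tuple_counts[(prior, item, bin_label(gap.days))] += 1
    return total_counts, tuple_counts

# Hypothetical histories: user1 acquires item 2 three months and ten days after item 1.
histories = {
    "user1": [("item 1", date(2004, 1, 1)), ("item 2", date(2004, 4, 10))],
    "user2": [("item 1", date(2004, 2, 1))],
}
totals, tuples = count_acquisitions(histories, {"item 1"}, {"item 2"})
```

Using a Counter means a tuple that does not yet exist is created implicitly on its first increment, which corresponds to adding a missing tuple in step 80.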

Tables 1 and 2 below illustrate example count values that may be generated for a given pair of items as the result of steps 76 and 80 of FIG. 4. Each row in Table 2 corresponds to a respective tuple, with each tuple corresponding to a respective time interval bin. In this example, the following parameters are used to define the time interval bins: MinT=0 (i.e., the first bin starts at time zero), BinW=3 months (i.e., bounded bins have a width of three months), and MaxT=15 months (i.e., the last bin, which is unbounded, begins at 15 months). These parameters may be adjusted to increase and decrease the granularity and scope of the analysis. Various other types of parameters may additionally or alternatively be used to control the data mining process.

TABLE 1
Total Acquisition Counts Table

Item                                  Total Acquisition Count
Linksys BEFSR41 (Wired Router)        7055
Linksys WRT54G (Wireless-G Router)    5145
. . .                                 . . .

TABLE 2
Tuple Counts Table

First item         Second item       Time Interval Bin    Count
Linksys BEFSR41    Linksys WRT54G    0-3 months           72
Linksys BEFSR41    Linksys WRT54G    3-6 months           325
Linksys BEFSR41    Linksys WRT54G    6-9 months           552
Linksys BEFSR41    Linksys WRT54G    9-12 months          884
Linksys BEFSR41    Linksys WRT54G    12-15 months         640
Linksys BEFSR41    Linksys WRT54G    15+ months           1243
. . .              . . .             . . .                . . .
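The way the MinT, BinW, and MaxT parameters define the bins used in Table 2 can be sketched as a small helper. The function name and the choice to treat each bin's lower bound as inclusive are assumptions made for illustration; the text does not specify how boundary values are assigned.

```python
def time_interval_bin(months_elapsed, min_t=0, bin_w=3, max_t=15):
    """Map an elapsed time in months to a bin label, or None if below MinT.

    Assumption: lower bounds are inclusive, so exactly 3 months falls in "3-6 months".
    """
    if months_elapsed < min_t:
        return None                       # too close in time; excluded
    if months_elapsed >= max_t:
        return f"{max_t}+ months"         # the last, unbounded bin
    lower = min_t + int((months_elapsed - min_t) // bin_w) * bin_w
    upper = min(lower + bin_w, max_t)
    return f"{lower}-{upper} months"

labels = [time_interval_bin(m) for m in (0, 2.5, 3, 10.5, 14, 15, 40)]
```

Smaller values of bin_w give finer granularity, and larger values of max_t extend how far the bounded bins reach before the unbounded bin begins.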

From the data maintained in these temporary tables, a variety of different conditional probability values can be calculated. For example, the table data reveals that, of those users who acquired a Linksys BEFSR41 Wired Router:

- 884/7055, or 13%, acquired the Linksys WRT54G Wireless-G Router nine to twelve months later;
- (884+640+1243)/7055, or 39%, acquired the Linksys Wireless-G Router nine or more months later; and
- (72+325+552+884+640+1243)/7055, or 53%, acquired the Linksys Wireless-G Router at some point after acquiring the wired router.

These values are referred to as “conditional probability” values, as they generally represent conditional probabilities that a user will acquire the second item if the user acquires the first item. Multiple different conditional probability values may be calculated, and stored in the table 46, for a given pair of items.
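The three example values can be reproduced directly from the counts in Tables 1 and 2. The sketch below uses only those counts; the variable and function names are illustrative.

```python
TOTAL_FIRST_ITEM = 7055   # Table 1 count for the Linksys BEFSR41
BIN_COUNTS = {            # Table 2 counts for BEFSR41 -> WRT54G, in bin order (months)
    "0-3": 72, "3-6": 325, "6-9": 552, "9-12": 884, "12-15": 640, "15+": 1243,
}
ORDER = list(BIN_COUNTS)

def conditional_probability(bins, counts=BIN_COUNTS, total=TOTAL_FIRST_ITEM):
    """Fraction of first-item acquirers whose second acquisition fell in the given bins."""
    return sum(counts[b] for b in bins) / total

in_interval = conditional_probability(["9-12"])
interval_and_beyond = conditional_probability(ORDER[ORDER.index("9-12"):])
overall = conditional_probability(ORDER)

print(f"{in_interval:.0%} {interval_and_beyond:.0%} {overall:.0%}")  # prints: 13% 39% 53%
```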

Table 2 in this example also reveals that users typically acquire the second item about 9-12 months after acquiring the first item. Thus, the count values in Table 2 may be used to identify a characterizing time interval.

Although not depicted by the above examples, the method used to calculate the conditional probability values may discount or disregard the most recent acquisitions of former-acquisition-pool items, since users who made these acquisitions may still acquire one or more new-acquisition-pool items within the relevant time periods. Thus, for example, the calculation 884/7055 above may be changed to 884/(7055−R), where R represents the number of users who have purchased the BEFSR41 Wired Router within the last nine months.
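The adjustment for recent acquirers might be sketched as follows. The value of R used here is purely hypothetical, since the text gives no figure for it, and the function name is illustrative.

```python
def adjusted_probability(bin_count, total_acquirers, recent_acquirers):
    """Conditional probability with recent first-item acquirers removed from the base."""
    eligible = total_acquirers - recent_acquirers
    if eligible <= 0:
        raise ValueError("no acquirers old enough to have reached this interval")
    return bin_count / eligible

R = 1000  # hypothetical number of BEFSR41 purchasers within the last nine months
unadjusted = 884 / 7055
adjusted = adjusted_probability(884, 7055, R)
```

Because the denominator shrinks while the numerator is unchanged, the adjusted value is never lower than the unadjusted one.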

FIG. 4B illustrates additional steps that may be performed to build the sequential-acquisition pattern table 46 of FIG. 1 using the data stored in the two temporary tables. In block 90, the tuple counts table (Table 2) is filtered to remove all item pairs for which either (a) an insufficient number of the users who acquired the first item thereafter acquired the second item (e.g., fewer than 5%), or (b) the count values are too low to generate statistically reliable results. Thus, for example, if the item pair depicted in Table 2 met either of these removal conditions, the six corresponding tuple entries (rows) of this table would be discarded. Typically, most of the tuples represented in the tuple counts table will be discarded as the result of this step 90. The tuple counts table may additionally or alternatively be filtered by using a randomization-test method to calculate probabilities that specific item pairs appear by chance, and by using the resulting probability values to select the item pairs to be retained. Specific examples of randomization tests that may be used are described separately below.
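The filtering of block 90 might look like the sketch below. The 5% threshold is the example given in the text; the minimum pair count of 100 is a hypothetical stand-in for "too low to generate statistically reliable results", and the second item pair in the sample data is invented for illustration.

```python
from collections import defaultdict

def filter_pairs(tuple_counts, total_counts, min_fraction=0.05, min_pair_count=100):
    """Drop every tuple of an item pair that fails either retention threshold."""
    pair_totals = defaultdict(int)
    for (first, second, _bin), count in tuple_counts.items():
        pair_totals[(first, second)] += count
    kept = {
        pair for pair, total in pair_totals.items()
        if total >= min_pair_count                            # condition (b), assumed cutoff
        and total / total_counts[pair[0]] >= min_fraction     # condition (a)
    }
    return {key: n for key, n in tuple_counts.items() if key[:2] in kept}

total_counts = {"Linksys BEFSR41": 7055}
tuple_counts = {
    ("Linksys BEFSR41", "Linksys WRT54G", "0-3 months"): 72,
    ("Linksys BEFSR41", "Linksys WRT54G", "3-6 months"): 325,
    ("Linksys BEFSR41", "Linksys WRT54G", "6-9 months"): 552,
    ("Linksys BEFSR41", "Linksys WRT54G", "9-12 months"): 884,
    ("Linksys BEFSR41", "Linksys WRT54G", "12-15 months"): 640,
    ("Linksys BEFSR41", "Linksys WRT54G", "15+ months"): 1243,
    ("Linksys BEFSR41", "Item Q", "0-3 months"): 40,   # hypothetical weak pair
}
filtered = filter_pairs(tuple_counts, total_counts)
```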

As depicted by blocks 92-98, the process then analyzes the table data of the remaining item pairs (those that have not been filtered out) to determine whether a characterizing time interval exists, and to calculate one or more conditional probability values for the item pair. The characterizing time intervals may be identified using a limit test that compares the count values for each of the time interval bins. For example, a given time interval bin, such as the bin 9-12 months in Table 2, may be treated as the characterizing time interval for the item pair if the count value for this bin/tuple both (a) represents at least 10% of the sum of the tuple count values for this item pair, and (b) is the highest tuple count value for any three-month bin for this item pair. Other types of algorithms, such as a randomization-test algorithm, may additionally or alternatively be used to detect and identify characterizing time intervals (see description below).
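The example limit test can be sketched as follows. The sketch reads "any three-month bin" as excluding the unbounded last bin, which is an interpretation rather than something the text states outright; it is consistent with Table 2, where the 15+ months bin has the largest count but 9-12 months is named as the characterizing interval. The function name is illustrative.

```python
def characterizing_interval(bin_counts, min_share=0.10):
    """Return the bounded bin passing both limit-test conditions, or None.

    Assumption: the unbounded bin (label ending in "+ months") is not a candidate,
    although its count still contributes to the pair's total.
    """
    total = sum(bin_counts.values())
    bounded = {b: n for b, n in bin_counts.items() if not b.endswith("+ months")}
    if not bounded or total == 0:
        return None
    best = max(bounded, key=bounded.get)        # condition (b): highest bounded bin
    if bounded[best] >= min_share * total:      # condition (a): at least 10% of the sum
        return best
    return None

table_2 = {
    "0-3 months": 72, "3-6 months": 325, "6-9 months": 552,
    "9-12 months": 884, "12-15 months": 640, "15+ months": 1243,
}
interval = characterizing_interval(table_2)
```

If two bounded bins tie for the highest count, max returns the first one encountered; a production implementation would need an explicit tie-breaking rule.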

As depicted in block 98 of FIG. 4, the attributes extracted in steps 94 and 96 are used to create a corresponding entry in the sequential-acquisition pattern table 46 of FIG. 1. For instance, for the example data in Tables 1 and 2 above, the following entry may be created:

first item: Linksys BEFSR41

second item: Linksys WRT54G

characterizing time interval: 9-12 months

conditional probability for characterizing time interval: 13%

conditional probability for characterizing time interval and beyond: 39%

overall conditional probability: 53%

Referring again to FIG. 1, the data stored in the table 46 may additionally or alternatively be used by a “sequential pattern based recommendations” program module 35 to recommend specific catalog items to users based on the item acquisition histories of such users. For example, in connection with the first table entry in FIG. 1, item D may be recommended to a user who purchased item A three to four months ago and has not yet purchased item D. The recommendation may be made via an email communication, a personalized web page, or any other communication method, and may include a message explaining why the recommendation is being made. For example, a message of the following format may be transmitted to a purchaser of a Canon i560 approximately three months after the purchase date: “It has been three months since you purchased the Canon i560 Desktop Photo Printer. We thought you might like to know that users who have purchased this item have purchased the following items three to five months later: 20% bought the Canon BCI-6Y Yellow Ink Tank, 19% bought . . . .”

FIG. 5 illustrates an example of a process that may be embodied within the recommendations component 35, and executed on a daily basis, to provide such recommendations. It is assumed in this example that each entry in the table 46 specifies a characterizing time interval in terms of months, and that the recommendations are provided by email. In step 110, the first entry in the table 46 is selected as the current table entry. In step 112, the item acquisition histories of all users are checked to identify all users (if any) that both (a) acquired the first item in the current table entry exactly M months ago, where M is the lower bound of the corresponding characterizing time interval, and (b) have not yet acquired the second item in this entry. For each user identified in step 112, a respective entry is created in a temporary table in step 114 with the ID of the first item, the ID of the second item, and the ID of the user.

As depicted by blocks 116 and 118, steps 112 and 114 are then repeated for each additional entry in the sequential-acquisition pattern table 46. Finally, in step 120, the temporary table entries are aggregated by user ID so that each user receives only a single email message (which may include multiple recommendations, and may be based on more than one prior acquisition by the corresponding user), and the email messages are sent to the users. The results may alternatively be presented on a personalized web page the next time the user visits the web site.
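The daily process of FIG. 5 might be sketched as below. The data layout and function names are hypothetical, and "exactly M months ago" is interpreted here as the same day of the month M calendar months earlier, which is an assumption (month-end edge cases are ignored). Sending the email itself is omitted.

```python
from collections import defaultdict
from datetime import date

def months_between(earlier, later):
    """Whole calendar months from one date to another, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def build_recommendations(pattern_table, histories, today):
    """Return {user: [(first item, second item), ...]}, one aggregated list per user."""
    per_user = defaultdict(list)
    for first, second, lower_bound_months in pattern_table:       # steps 110, 116, 118
        for user, events in histories.items():                    # step 112
            if any(item == second for item, _ in events):
                continue                                           # (b) already acquired
            for item, when in events:
                if (item == first and when.day == today.day
                        and months_between(when, today) == lower_bound_months):
                    per_user[user].append((first, second))         # step 114
                    break
    return dict(per_user)                                          # step 120 aggregation

# Hypothetical table entries (first item, second item, lower bound M) and histories.
pattern_table = [("A", "D", 3), ("C", "E", 3)]
histories = {
    "u1": [("A", date(2004, 6, 15)), ("C", date(2004, 6, 15))],
    "u2": [("A", date(2004, 6, 15)), ("D", date(2004, 8, 1))],
    "u3": [("A", date(2004, 5, 15))],
}
recommendations = build_recommendations(pattern_table, histories, date(2004, 9, 15))
```

Keying the result by user ID means each user ends up with a single list, so one message can carry all of that user's recommendations.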

Use of Randomization Tests to Evaluate Relationships

In addition or as an alternative to using limit tests, the data mining component 44 may use one or more different types of randomization tests to evaluate the strengths of the relationships between specific items. Consider the following statement: users who acquire item X are more likely to acquire item Y at time Z. To evaluate this statement, the following variables may be defined:

A is the set of all former acquisition pool items, i.e., those that can be plugged in for X;

B is the set of all new purchase pool items, i.e., those that can be plugged in for Y;

T is the set of time values, or time interval bin values, that can be plugged in for Z;

‘a’ is an element in A;

‘b’ is an element in B; and

‘t’ is an element in T.

If we say that users have a propensity to acquire ‘b’ ‘t’ units of time after acquiring ‘a’, we are saying that p(‘b’|‘a’, ‘t’) is greater than p(Y|X, Z) for arbitrary values of X, Y and Z.

In one embodiment, the data mining component 44 uses the tuple counts table (the general format of which is shown above in Table 2) to test for the existence of two different types of relationships: (R1) whether users are more likely to acquire ‘b’ after acquiring ‘a’ in general; and (R2) whether users are more likely to acquire ‘b’ a specific time interval range after acquiring ‘a’. Item pairs that do not exhibit at least one of these two types of relationships, R1 or R2, can be excluded from the sequential-acquisition pattern table 46. The type or types of relationships that exist for a given item pair may also be recorded in this table 46 and reflected on item detail pages.

In one embodiment, the data mining component 44 tests for the existence of relationships R1 and R2 using the Bootstrap method, which is a type of randomization test. The Bootstrap method is a well-known statistical analysis method that uses randomization to test the reliability of a set of data, or an inference drawn therefrom, and is described in “An Introduction to the Bootstrap” by Bradley Efron and Robert J. Tibshirani, published 1994 by Chapman & Hall/CRC (ISBN: 0412042312), the disclosure of which is hereby incorporated by reference. The following is one example of a Bootstrap procedure that may be used by the data mining component 44 to test for relationship R1:

1. Form a sample pool of items that appear in the second column of the tuple counts table (i.e., the column for “new acquisition pool” items). Include N units of each such item in the sample pool, where N is the total number of times that item was acquired as a second acquisition, as reflected in the tuple counts table. (Note that N may be determined for a given item by summing the count values of all rows in which that item appears in the second column.)

2. Select an item from column one of the tuple counts table (i.e., the column for “former acquisition pool” items), and for each time that item was acquired as a first acquisition, randomly select an item from the sample pool, with replacement, to form a new pair. At the end of this random assignment procedure, sum the number of occurrences for each pair of items to obtain a view of possible random association for that pair.

3. Repeat step 2 many times (e.g., 500-5000 times) to generate a distribution for the association counts for random assignments.

4. Use this distribution to estimate a pvalue, the probability that a given association is purely due to chance.

For example, suppose the total count values reflected in the tuple counts table are as follows:

a    b    Count
C    X    100
C    Y    10
C    Z    5
D    X    6
D    Y    210
D    Z    50

In this simple example, the sample pool generated in step 1 would consist of 106 units of X, 220 units of Y, and 55 units of Z. In step 2 we would take each item that occurs in column 1 (namely C and D), and for each time that item was acquired as a first acquisition, randomly select an item from the sample pool, with replacement, to form an item pair (a, b). Thus, 100+10+5=115 item pairs of the form (C, ?) would be created, and 6+210+50=266 item pairs of the form (D, ?) would be created, where the question marks represent items selected, with replacement, from the sample pool. The results of this random assignment procedure may, for example, render the following resampling:

a    b    Count
C    X    33
C    Y    64
C    Z    18
D    X    71
D    Y    159
D    Z    36

This data set represents one snapshot of what random associations between these items might look like. By repeating this process many times (step 3 above), a distribution for the association counts can be obtained and used to estimate the pvalues (step 4).

Small pvalues indicate a high likelihood of a real effect rather than a random artifact. Accordingly, the data mining component 44 can use a pvalue threshold to determine whether relationship R1 exists for a given item pair. For example, a pvalue threshold of 5% may be used, corresponding to a 95% confidence level; or a threshold of 1% may be used, corresponding to a 99% confidence level. The result of this analysis tells us whether the pairing of ‘a’ and ‘b’ in the tuple counts table is due to random occurrence, or whether users actually have a propensity to acquire ‘b’ after ‘a’.
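The four-step procedure and the pvalue threshold described above might be sketched, purely as a non-limiting illustration, as follows. The function name, the dictionary representation of the tuple counts table, the one-sided comparison (“at least as many random pairings as observed”), and the add-one correction in the pvalue estimate are assumptions of this sketch rather than details stated above.

```python
import random
from collections import Counter


def bootstrap_pvalues(tuple_counts, rounds=1000, seed=0):
    """Estimate a one-sided pvalue for each (a, b) pair.

    tuple_counts: dict mapping (a, b) -> observed count (time bins summed out).
    """
    rng = random.Random(seed)

    # Step 1: the sample pool holds N units of each second-column item.
    pool_counts = Counter()
    first_totals = Counter()
    for (a, b), n in tuple_counts.items():
        pool_counts[b] += n
        first_totals[a] += n
    items = list(pool_counts)
    weights = [pool_counts[b] for b in items]

    at_least_observed = Counter()
    for _ in range(rounds):  # step 3: repeat the random assignment
        simulated = Counter()
        for a, total in first_totals.items():  # step 2
            # Weighted choice with replacement is equivalent to drawing
            # units from the pool with replacement.
            for b in rng.choices(items, weights=weights, k=total):
                simulated[(a, b)] += 1
        for pair, observed in tuple_counts.items():
            if simulated[pair] >= observed:
                at_least_observed[pair] += 1

    # Step 4: share of random rounds that matched or beat the observed count.
    # The add-one correction (an assumption) keeps the estimate above zero.
    return {
        pair: (at_least_observed[pair] + 1) / (rounds + 1)
        for pair in tuple_counts
    }


# Counts from the example table above.
example = {
    ("C", "X"): 100, ("C", "Y"): 10, ("C", "Z"): 5,
    ("D", "X"): 6, ("D", "Y"): 210, ("D", "Z"): 50,
}
pvalues = bootstrap_pvalues(example, rounds=200)
r1_pairs = [pair for pair, p in pvalues.items() if p < 0.05]  # 5% threshold
```

With the example counts, the pairs (C, X) and (D, Y) occur far more often than random pairing would produce (roughly 32 and 154 expected occurrences, respectively), so they would be expected to pass a 5% threshold, whereas (C, Y) would not.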

To test for relationship R2, the data mining component 44 may use one or both of the following methods. The first method is to use a Chi Squared test against a uniform distribution across all separation values (column 3 of the tuples table). This method is less computationally intensive, but makes the potentially-erroneous assumption that the comparison distribution is uniform.
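As a minimal sketch of the first method, the Chi Squared statistic for one item pair may be computed from its per-bin counts as shown below. The function names, the example counts, and the use of a tabulated critical value (7.815 for three degrees of freedom at the 5% level) in place of an exact pvalue are assumptions of this sketch; the bin counts are assumed to be non-empty with a positive total.

```python
def chi_squared_uniform(bin_counts):
    """Chi Squared statistic of observed bin counts against a uniform split."""
    total = sum(bin_counts)
    expected = total / len(bin_counts)  # same expected count in every bin
    return sum((obs - expected) ** 2 / expected for obs in bin_counts)


def departs_from_uniform(bin_counts, critical_value):
    """True if the statistic exceeds the supplied critical value."""
    return chi_squared_uniform(bin_counts) > critical_value


# 5% critical value for 3 degrees of freedom (four bins), from standard tables.
CRITICAL_5PCT_DF3 = 7.815

# Hypothetical counts for one (a, b) pair across four separation bins.
clustered = [10, 70, 15, 5]
flat = [25, 25, 25, 25]
```

Here `departs_from_uniform(clustered, CRITICAL_5PCT_DF3)` is true, because the statistic of 110 far exceeds the critical value, while the evenly spread counts in `flat` yield a statistic of zero.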

The second method to test for R2 is to gather all rows of the tuple counts table that correspond to particular values for columns 1 and 2, then sum all of the counts for column 4, then randomly assign values to column 4 from this sum. (Any rows that correspond to open-ended time interval bins may be ignored.) At the end of this random assignment, the sum of column 4 for all of the rows will be the same as the original sum, but with a different distribution. Again repeating this process and using the Bootstrap method, we can determine the likelihood that the original distribution was due only to chance.
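One possible reading of this second method is sketched below for illustration only. It assumes that each counted acquisition is reassigned to one of the bounded time interval bins uniformly at random, that the bins are of comparable width, and that a one-sided, add-one pvalue estimate is used per bin; none of these choices, nor the function name, is specified above.

```python
import random


def interval_bin_pvalues(bin_counts, rounds=1000, seed=0):
    """Estimate, per bin, how often random reassignment matches the observed count.

    bin_counts: observed counts (column 4) for one (a, b) pair, one entry per
    bounded time interval bin; open-ended bins are assumed to be excluded.
    """
    rng = random.Random(seed)
    total = sum(bin_counts)  # the sum is preserved in every round
    k = len(bin_counts)
    at_least_observed = [0] * k
    for _ in range(rounds):
        simulated = [0] * k
        for _ in range(total):
            simulated[rng.randrange(k)] += 1  # uniform assignment (assumption)
        for i in range(k):
            if simulated[i] >= bin_counts[i]:
                at_least_observed[i] += 1
    return [(n + 1) / (rounds + 1) for n in at_least_observed]


# Hypothetical counts for one (a, b) pair across four bounded bins.
observed_bins = [5, 60, 10, 5]
bin_pvalues = interval_bin_pvalues(observed_bins, rounds=300)
```

A small pvalue for a bin (here, the second bin) suggests that acquisitions of ‘b’ are concentrated in that interval more than random assignment would explain.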

For a given pair of items, the outcome of the tests for relationship R2, for different time interval values or bins, can be used to determine whether a characterizing time interval (or time interval bin) exists for that pair.

The various functional components described herein, including the data mining component 44, the web server 32, the item acquisition processing component 33, and the sequential pattern based recommendations component 35, may be implemented in software executed by one or more general purpose computers. The various data elements depicted in FIG. 1, including web page templates, the catalog item data, the item acquisition histories, the sequential-acquisition pattern table 46, and the data mining parameters, may be stored in one or more databases, and/or other types of data repositories, using any type or types of computer storage, including but not limited to hard disk drive storage, solid state volatile and non-volatile storage, and tape drives. The sequential-acquisition pattern table 46 may be implemented using any data structure, or combination of structures, that can be used to look up the item relationship data associated with a given item.

Other features and components that may be included in the above-described web server system 30 are described in the following U.S. patent documents, the disclosures of which are hereby incorporated by reference: U.S. Pub. No. US 2002/0019763 A1, published Feb. 14, 2002, and U.S. patent application Ser. No. 10/864,288, filed Jun. 9, 2004.

As will be apparent, the features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure.

Although this invention has been described in terms of certain preferred embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this invention. Accordingly, the scope of the present invention is defined only by reference to the appended claims, which are to be construed without reference to any definitions that may be explicitly or implicitly set forth in the incorporated-by-reference materials.

1. A data mining method, comprising: storing, in computer storage, item acquisition data of users of an electronic catalog of items, said item acquisition data including information reflective of timings of item acquisition events, said electronic catalog including item detail pages that correspond to particular catalog items; detecting, based on an analysis of the item acquisition data by a computer system, a sequential item acquisition pattern in which users who acquire a first catalog item tend to subsequently acquire a second catalog item; and causing an indication of the sequential item acquisition pattern to be incorporated into an item detail page for the first catalog item, to thereby expose an existence of the sequential item acquisition pattern to users of the electronic catalog.
2. The method of claim 1, further comprising generating, by the computer system, based on the stored item acquisition data, statistical data regarding the sequential item acquisition pattern, and causing said statistical data to be incorporated into the item detail page with the indication of the sequential item acquisition pattern.
3. The method of claim 2, wherein the statistical data comprises a representation of an amount of time users typically wait to acquire the second catalog item after acquiring the first catalog item.
4. The method of claim 2, wherein the statistical data comprises data regarding what percentage of users who acquire the first catalog item subsequently acquire the second catalog item.
5. The method of claim 4, wherein the percentage is tied to a bounded time interval range.
6. The method of claim 2, wherein generating the statistical data comprises determining, based on an analysis of time intervals between user acquisitions of the first and second catalog items, whether a characterizing time interval exists that represents a typical amount of time users wait to acquire the second catalog item after acquiring the first catalog item.
7. The method of claim 1, wherein the item acquisitions are item purchases.
8. Non-transitory computer storage that stores executable program code that directs a computer system comprising one or more computers to perform a process that comprises: storing, in computer storage, data regarding item acquisitions of users of an electronic catalog of items, said data including information reflective of timings of item acquisition events; detecting, based on an analysis of the stored data regarding item acquisitions, a sequential item acquisition pattern in which users who acquire a first catalog item subsequently acquire a second catalog item; and causing an indication of the sequential item acquisition pattern to be incorporated into an electronic catalog page associated with the first catalog item, to thereby expose an existence of the sequential item acquisition pattern to users of the electronic catalog.
9. The non-transitory computer storage of claim 8, wherein the process further comprises generating, based on the stored data regarding item acquisitions, statistical data regarding the sequential item acquisition pattern, and causing said statistical data to be incorporated into the electronic catalog page in association with the indication of the sequential item acquisition pattern.
10. The non-transitory computer storage of claim 9, wherein the statistical data comprises a representation of an amount of time users typically wait to acquire the second catalog item after acquiring the first catalog item.
11. The non-transitory computer storage of claim 9, wherein the statistical data comprises data regarding what percentage of users who have acquired the first catalog item have subsequently acquired the second catalog item.
12. The non-transitory computer storage of claim 9, wherein generating the statistical data comprises determining, based on an analysis of time intervals between user acquisitions of the first and second catalog items, whether a characterizing time interval exists that represents a typical amount of time users wait to acquire the second catalog item after acquiring the first catalog item.
13. The non-transitory computer storage of claim 8, wherein the item acquisitions are item purchases.
14. The non-transitory computer storage of claim 8, in combination with the computer system, wherein the computer system is programmed with said executable program code to perform said process.
15. A data mining method, comprising: storing, in computer storage, item acquisition data of users of an electronic catalog of items, said item acquisition data including information reflective of timings of item acquisition events; identifying a pair of catalog items, item A and item B, that, based on said item acquisition data, have been acquired in the sequence item A followed by item B by each of a plurality of said users; and determining, based on time intervals between user acquisitions of item A and item B among said plurality of users, an amount of time users typically wait to acquire item B after acquiring item A; said method performed by a computer system that comprises one or more computers.
16. The data mining method of claim 15, wherein the amount of time is determined as a range of time intervals.
17. The data mining method of claim 15, wherein determining the amount of time users typically wait comprises determining whether a characterizing time interval exists.
18. The data mining method of claim 15, further comprising programmatically using the determined amount of time to select a timing with which to recommend item B to a user who has acquired item A.
19. The data mining method of claim 15, further comprising causing an electronic catalog page associated with item A to be supplemented with an indication of said amount of time users typically wait to acquire item B after acquiring item A.
20. The data mining method of claim 19, further comprising calculating, by the computer system, what percentage of users who have acquired item A have acquired item B after waiting said amount of time, and causing said electronic catalog page to be supplemented with an indication of said percentage.